Abstract



The following two decision problems capture the complexity of comparing integers or rationals
that are succinctly represented in product-of-exponentials notation, or equivalently, via
arithmetic circuits using only multiplication and division gates, and integer inputs:

Input instance: four lists of positive integers:

$(a_1, \ldots, a_n) \in \mathbb{N}_+^n$; $(b_1, \ldots, b_n) \in \mathbb{N}_+^n$; $(c_1, \ldots, c_m) \in \mathbb{N}_+^m$; $(d_1, \ldots, d_m) \in \mathbb{N}_+^m$;

where each of the integers is represented in binary.

Problem 1 (equality testing): Decide whether $a_1^{b_1} a_2^{b_2} \cdots a_n^{b_n} = c_1^{d_1} c_2^{d_2} \cdots c_m^{d_m}$.

Problem 2 (inequality testing): Decide whether $a_1^{b_1} a_2^{b_2} \cdots a_n^{b_n} > c_1^{d_1} c_2^{d_2} \cdots c_m^{d_m}$.

Problem 1 is easily decidable in polynomial time using a simple iterative algorithm. Problem
2 is much harder. We observe that the complexity of Problem 2 is intimately connected to deep
conjectures and results in number theory. In particular, if a refined form of the ABC conjecture
formulated by Baker in 1998 holds, or if the older Lang-Waldschmidt conjecture (formulated in
1978) on linear forms in logarithms holds, then Problem 2 is decidable in P-time (in the standard
Turing model of computation). Moreover, it follows from the best available quantitative bounds
on linear forms in logarithms, e.g., by Baker and Wüstholz [4] or Matveev [19], that if m and
n are fixed universal constants then Problem 2 is decidable in P-time (without relying on any
conjectures).

We describe one application: P-time maximum probability parsing for arbitrary stochastic
context-free grammars (where ε-rules are allowed).

1 Introduction 

For many computations involving large integers, or large/small non-zero rationals, it is convenient
to be able to manipulate and compare the numbers without having to compute a standard binary
representation of them. Indeed, in many settings it is intractable to compute such a binary
representation. This has motivated compact representations such as classic floating point, but floating
point numbers also suffer a loss of information (precision), which one would like to avoid if possible.
There are a number of succinct representations one could consider for such a purpose. One
approach is to represent integers via arithmetic circuits (straight-line programs), with gates {+, −, *},
and with integer inputs (represented in binary). However, the problem of deciding whether an
integer represented via an arithmetic circuit is > another such integer, referred to as PosSLP [1],
captures all of polynomial time in the unit-cost arithmetic RAM model of computation. The best
complexity upper bound we know for PosSLP in the standard Turing model of computation is the
4th level of the counting hierarchy, $P^{PP^{PP^{PP}}}$, as established by Allender, Bürgisser,
Kjeldgaard-Pedersen, and Miltersen in [1]. PosSLP subsumes other hard problems whose complexity is open,
like the well-known square-root-sum problem ([H]), and it appears highly unlikely that one could
show that PosSLP is even in NP. On the other hand, as noted in [1], the problem of testing equality
of integers represented by two such arithmetic circuits, EquSLP, is P-time equivalent to polynomial
identity testing, and can be decided in coRP. However, it remains open whether EquSLP is in
NP, and showing this would already imply hard circuit lower bounds ([H]), so it is likely to be
difficult. On the other hand, there are no hardness results known for PosSLP with respect to standard
complexity classes, beyond P-hardness. In particular, PosSLP is not known to be NP-hard.

Note that if the arithmetic in the computation is confined to only linear operations {+, −} on
registers, and multiplication by constants in the input, then the encoding length of the numbers
is linear in the number of arithmetic operations, so we can represent all the numbers exactly, and
P-time in the Turing model can simulate polynomial time unit-cost linear arithmetic computation.

Now consider another natural restricted class of arithmetic circuits, which turn out to be useful
in a number of settings: arithmetic circuits containing only gates {*, /}. An essentially equivalent
representation is the following: For a list of rational numbers $a = (a_1, \ldots, a_n)$, and a list of integers
$b = (b_1, \ldots, b_n)$, both of dimension n, we use $a^b$ as a shorthand notation for: $a_1^{b_1} a_2^{b_2} \cdots a_n^{b_n}$. We shall
refer to this succinct representation of integers and rationals as product of exponentials (PoE)
representation. PoE representation is easily seen to be equivalent to representation via arithmetic
circuits with integer inputs given in binary and with multiplication and division gates {*, /}. The
following is shown in the appendix.

Proposition 1.1. There is a simple P-time translation from a given number represented in PoE to 
the same number represented as an arithmetic circuit over {*,/} with integer inputs (represented 
in binary). Likewise, there is a simple P-time translation in the other direction. 
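
The PoE-to-circuit direction of Proposition 1.1 can be illustrated with repeated squaring. The following is only a sketch of one natural construction, not necessarily the one in the appendix; the node encoding and the names `poe_to_circuit` and `evaluate` are hypothetical choices made for this illustration.

```python
from fractions import Fraction

def poe_to_circuit(bases, exps):
    """Build a straight-line program over {*, /} for prod bases[i] ** exps[i].

    Each node is a tuple (op, x, y): ("in", value, None) is an input,
    and ("*", i, j) or ("/", i, j) combine earlier nodes i and j.
    """
    nodes = [("in", 1, None)]          # node 0 holds the constant 1
    result = 0                          # index of the running product
    for base, e in zip(bases, exps):
        nodes.append(("in", base, None))
        square = len(nodes) - 1         # node holding base ** (2 ** t)
        power = 0                       # node holding base ** (low bits of |e|)
        n = abs(e)
        while n:
            if n & 1:
                nodes.append(("*", power, square))
                power = len(nodes) - 1
            n >>= 1
            if n:
                nodes.append(("*", square, square))
                square = len(nodes) - 1
        nodes.append(("*" if e >= 0 else "/", result, power))
        result = len(nodes) - 1
    return nodes

def evaluate(nodes):
    """Evaluate the circuit exactly (only sensible for small exponents)."""
    vals = []
    for op, x, y in nodes:
        if op == "in":
            vals.append(Fraction(x))
        elif op == "*":
            vals.append(vals[x] * vals[y])
        else:
            vals.append(vals[x] / vals[y])
    return vals[-1]
```

The number of gates grows with the bit length of each exponent rather than with its value, which is why the circuit has size polynomial in the binary encoding of the PoE input.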

Consider the problem of deciding whether one rational number, $a^b$, given in PoE representation,
is greater than (or, respectively, equal to) another rational number $c^d$, given in PoE. We remark
that, again, the inequality testing problem basically captures the power of polynomial time in the
unit-cost arithmetic RAM model of computation, where the only arithmetic operations permitted
are {*, /}.
Input instance: four lists of positive integers:

$(a_1, \ldots, a_n) \in \mathbb{N}_+^n$; $(b_1, \ldots, b_n) \in \mathbb{N}_+^n$; $(c_1, \ldots, c_m) \in \mathbb{N}_+^m$; $(d_1, \ldots, d_m) \in \mathbb{N}_+^m$;
where each of the integers is represented, as usual, in binary.
Problem 1 (equality testing): Decide whether or not $a^b = c^d$.
Problem 2 (inequality testing): Decide whether or not $a^b > c^d$.

Note that, by rearrangement, these problems are equivalent to the versions allowing bases $a_i$ and
$c_j$ to be rationals (encoded in binary), and allowing $b_i$ and $d_j$ to be any integers (in binary).

Let us note that PoE representation is in fact already widely used in practice. Specifically,
because iterated multiplication may make numbers very small or very large, practitioners often
explicitly recommend using a logarithmic transformation to represent numbers such as $a_1^{b_1} \cdots a_n^{b_n}$
by:

$b_1 \log(a_1) + b_2 \log(a_2) + \cdots + b_n \log(a_n)$

This allows multiplications and divisions to be carried out by using only addition on the
coefficients of the log representations. Note that such "log representations" are basically equivalent
to PoE, as long as the logarithms are only interpreted symbolically. (Of course, one still needs to
be able to compare such sums of logs, and we will return to this shortly.) One setting where log
transformation is recommended in practice is for the analysis of Hidden Markov Models (HMMs)
using the Viterbi algorithm, and for probabilistic parsing. For example, the book by Durbin et al.
(Section 3.6) says:

On modern floating point processors we will run into numerical problems when
multiplying many probabilities in the Viterbi, forward, or backward algorithms. For DNA
for instance, we might want to model genomic sequences of 100 000 bases or more.
Assuming that the product of one emission and one transition probability is 0.1, the
probability of the Viterbi path would then be on the order of $10^{-100000}$. Most computers
would behave badly with such numbers. ... For the Viterbi algorithm we should always
use the logarithm of all probabilities. Since the log of a product is the sum of the logs, all
the products are turned into sums. ... It is more efficient to take the log of all of the
model parameters before running the Viterbi algorithm, to avoid calling the logarithm
function repeatedly during the dynamic programming iterations. (fSJ, pages 78-79.) 
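
The quoted numerical issue, and the symbolic "log representation" discussed above, can be illustrated with a small sketch. The dictionary encoding of PoE (base mapped to integer exponent) and the helper names `poe_mul` and `poe_div` are assumptions made for this illustration only.

```python
import math
from collections import Counter
from fractions import Fraction

# The quoted example: multiplying 100 000 factors of 0.1 in floating point.
p = 1.0
for _ in range(100_000):
    p *= 0.1
underflowed = (p == 0.0)           # the product underflows to zero

# The floating-point log transformation keeps a usable (but inexact) value.
log_p = 100_000 * math.log(0.1)

def poe_mul(x, y):
    """Multiply two PoE numbers, each a dict {base: exponent}.

    Only exponents are added; the bases are kept symbolic, so no
    precision is lost.
    """
    out = Counter(x)
    for base, e in y.items():
        out[base] += e
    return {base: e for base, e in out.items() if e != 0}

def poe_div(x, y):
    """Divide x by y by negating the exponents of y."""
    return poe_mul(x, {base: -e for base, e in y.items()})

# The Viterbi-path probability from the quote, represented exactly.
tenth = Fraction(1, 10)
viterbi = {tenth: 100_000}
```

Unlike `log_p`, the dictionary `viterbi` is exact, but comparing two such dictionaries is precisely Problem 2.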

We justify these comments from a complexity-theoretic viewpoint. In fact, we do so in the 
more general context of computing a maximum probability parse tree for a given string and given 
stochastic context-free grammar (SCFG), which generalizes the Viterbi algorithm for finite-state 
HMMs. We will observe that if deep conjectures in number theory hold then Problem 2 can be 
solved in polynomial time by employing the PoE (or log) representation, and also that the PoE 
representation can be used to obtain P-time algorithms for computing a maximum probability parse 
tree for a given string with a given SCFG, and for solving related problems. 

We first show Problem 1 is decidable in P-time using an easy iterative algorithm. Problem 2
is much harder. We observe that if the Lang-Waldschmidt conjecture [H] holds, then Problem 2
is decidable in P-time. Likewise, if Baker's refinement [3] of the ABC conjecture of Masser and
Oesterlé holds, then again Problem 2 is decidable in P-time. The ABC conjecture is
considered one of the central conjectures of modern number theory (see, e.g., [12], [5], [25], [17]).

Furthermore, we observe that the best currently known quantitative bounds in Baker's theory
of linear forms in logarithms, e.g., those due to Baker and Wüstholz [4] or Matveev [18], [19], yield
that when m and n are fixed universal constants, Problem 2 is decidable in polynomial time.

It is well known that the ABC conjecture, and related conjectures involving explicit bounds
for linear forms in logarithms, have important applications for effective solvability of various
diophantine equations. However, to our knowledge, it has not been observed previously that these
number-theoretic conjectures are also connected to the polynomial time solvability of basic problems
such as the comparison of succinctly represented numbers.

We give one application to maximum probability parsing for stochastic context-free grammars
(SCFG). Computing the maximum probability (most likely) parse of a given string for a SCFG is
a basic task in statistical natural language processing (NLP) [16]. Until now, it was only known
to be computable in P-time for particular classes of SCFGs, in particular SCFGs in Chomsky
Normal Form, and SCFGs not containing arbitrary ε-rules. For general SCFGs, G, the maximum
probability of a parse tree for even a fixed string, w, may be as small as 1/2^ , where |G| is 
the encoding size of the SCFG. Thus one cannot express such probabilities in P-time in binary 



representation. However, the maximum parse tree probabilities can be expressed concisely in PoE, 
and if we can check inequalities on such encodings of rational numbers efficiently, then we can 
compute the maximum parse tree probability in P-time. 

Specifically, we show that if Baker's refined ABC conjecture holds, or if the Lang-Waldschmidt
conjecture holds, then given an arbitrary SCFG G, and given an arbitrary string $w \in \Sigma^*$, there is a
polynomial-time algorithm that: (1) computes (a succinct representation of) a maximum probability
parse tree for the string w and also computes (a succinct representation of) the exact maximum
parse probability $p^{\max}_{G,w}$; (2) given also a rational probability q given in binary (or in PoE),
decides whether the maximum parse probability of w, $p^{\max}_{G,w}$, satisfies $p^{\max}_{G,w} > q$; and (3) given also
another string $w' \in \Sigma^*$, decides whether $p^{\max}_{G,w} > p^{\max}_{G,w'}$. Furthermore, when the SCFG has a fixed
constant number, m, of distinct probabilities labeling its rules, all of the above problems (1) - (3)
are in P-time (in the Turing model), without assuming any number theoretic conjectures. Finally,
we show that essentially the same algorithm can be used to approximate the maximum parsing
probability and compute (a succinct representation of) an ε-optimal parse tree for a string w and
a SCFG G in time polynomial in the size of G, w and $\log(1/\epsilon)$, without assuming any conjectures.

2 Deciding equality of succinct integers in P-time 

We now give a simple iterative P-time algorithm for Problem 1 (equality testing). The algorithm
is in Figure 1. It simply repeatedly computes $g_{i,j} = \mathrm{GCD}(a_i, c_j)$ for pairs $a_i$ and $c_j$, and if $g_{i,j} > 1$,
then it does the appropriate adjustments on the succinct representations. It terminates when
$g_{i,j} = 1$ for all i, j. The two remaining numbers are 1 if and only if the original numbers were equal.
In more detail, at the start of the iterative algorithm, we initialize four lists of positive integers:
$a := (a_1, \ldots, a_n)$, $b := (b_1, \ldots, b_n)$, $c := (c_1, \ldots, c_m)$, and $d := (d_1, \ldots, d_m)$. We can assume wlog
that lists a and c do not contain the number 1. For a list g, we use $|g|$ to denote its length. So,
initially, $|a| = |b| = n$ and $|c| = |d| = m$. Every iteration of the algorithm will maintain the
following invariant: $|a| = |b|$ and $|c| = |d|$. For a list g, we let $[g] = \{1, \ldots, |g|\}$. For a list g and
for $i \in [g]$ we use $g_i$ to denote the i'th element of the list g. Given two lists of integers, g and f,
where $|g| = |f| = r$, we use $g^f$ to denote $g_1^{f_1} \cdots g_r^{f_r}$.

Proposition 2.1. The algorithm in Figure 1 decides Problem 1, and runs in polynomial time.^1

Proof. For the correctness of the algorithm, we first observe that the following invariant is
maintained: the equality

$a^b = c^d$   (1)

holds before an iteration of the while loop if and only if it holds after an iteration of the while loop.
To see why this is the case, suppose that $\mathrm{GCD}(a_i, c_j) = g_{i,j} > 1$, and suppose that an iteration
of the while loop is conducted using this i and j. Then it is clear that the new lists a, b, c, and d
are obtained by simply factoring $g_{i,j}$ out of $a_i$ and $c_j$ on the left and right side of equation (1), and
then dividing both sides of the equation (1) by $g_{i,j}^{d_j}$, or by $g_{i,j}^{b_i}$, depending on whether $b_i > d_j$, or
$d_j > b_i$, respectively. Note that when $b_i = d_j$, dividing both sides by $g_{i,j}^{b_i}$ eliminates this power of
$g_{i,j}$ from both sides, so we do not need to include a power of $g_{i,j}$ on either side. Since factoring out



^1 It is worth noting that a more careful implementation of this simple algorithm, based on existing
"factor refinement" procedures ([2], [8]), can be used to solve Problem 1 very efficiently.



Input: 4 lists a, b, c, d, of positive integers, a & c not containing 1, |a| = |b|, & |c| = |d|.
Task: Decide whether or not a^b = c^d.

while there is some i ∈ [a] and j ∈ [c] such that GCD(a_i, c_j) > 1 do
    Let i ∈ [a] and j ∈ [c] be such that g_{i,j} = GCD(a_i, c_j) > 1;
    Let a_i ← a_i / g_{i,j}; Let c_j ← c_j / g_{i,j};
    if a_i = 1 then
        Remove a_i from list a, and remove b_i from list b
    end if
    if c_j = 1 then
        Remove c_j from list c, and remove d_j from list d
    end if
    if b_i > d_j then
        Append integer g_{i,j} to the end of list a;
        Append integer b_i − d_j to the end of list b;
    else if b_i < d_j then
        Append integer g_{i,j} to the end of list c;
        Append integer d_j − b_i to the end of list d;
    end if
end while
if a and c are both the empty list then
    return EQUAL
else
    return NOT-EQUAL
end if



Figure 1: Simple P-time algorithm for Problem 1 
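
The procedure of Figure 1 can be sketched in Python as follows. This is an illustrative transcription under a few assumptions: each base is stored together with its exponent as a pair, entries equal to 1 are dropped up front, and the function name `poe_equal` is hypothetical. It makes no attempt at the faster factor-refinement variants mentioned in the footnote.

```python
from math import gcd

def poe_equal(a, b, c, d):
    """Decide whether prod a[i]**b[i] == prod c[j]**d[j].

    All entries are assumed to be positive integers, as in Problem 1.
    """
    # Pair each base with its exponent; bases equal to 1 contribute nothing.
    left = [(x, e) for x, e in zip(a, b) if x != 1]
    right = [(y, e) for y, e in zip(c, d) if y != 1]
    while True:
        # Find some pair of bases sharing a non-trivial common factor.
        pair = next(((i, j)
                     for i, (x, _) in enumerate(left)
                     for j, (y, _) in enumerate(right)
                     if gcd(x, y) > 1), None)
        if pair is None:
            break
        i, j = pair
        (x, bi), (y, dj) = left[i], right[j]
        g = gcd(x, y)
        # Factor g out of both bases, dropping a base that becomes 1.
        if x // g == 1:
            del left[i]
        else:
            left[i] = (x // g, bi)
        if y // g == 1:
            del right[j]
        else:
            right[j] = (y // g, dj)
        # Cancel the smaller power of g; keep the surplus on one side.
        if bi > dj:
            left.append((g, bi - dj))
        elif bi < dj:
            right.append((g, dj - bi))
    return not left and not right
```

For instance, `poe_equal([4], [3], [8], [2])` compares $4^3$ with $8^2$ without ever forming the number 64, and the same code handles exponents with hundreds of digits because only the exponents' differences are ever computed.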



$g_{i,j}$ and dividing both sides of the equation by a positive value are reversible, we conclude that the
invariant is maintained.

Let us now argue that if the algorithm halts, i.e., if the while loop terminates, then the output
is correct. Indeed, if the while loop terminates this means for every pair of numbers $a_i$ and $c_j$ in
the lists a and c we have $\mathrm{GCD}(a_i, c_j) = 1$. Thus we also have $\mathrm{GCD}(a^b, c^d) = 1$. But in that case if
either $a^b \neq 1$ or $c^d \neq 1$, then $a^b \neq c^d$. Thus if the while loop halts the algorithm correctly returns
"EQUAL" or "NOT-EQUAL" depending on whether or not $a^b = c^d$ for the original input lists.

The only thing left to establish is that the algorithm always halts and runs in polynomial time. 

For a list of positive integers a, not containing the number 1, let us call another list a' of positive
integers a factored subform of a, if there is a mapping $\phi : [a'] \to [a]$ that maps indices $r \in [a']$ to
indices $\phi(r) \in [a]$, such that for all $i \in [a]$

$\left( \prod_{r \in \phi^{-1}(i)} a'_r \right) \mid a_i$

In other words, the product of all entries of a' that map to the entry $a_i$ of a should divide $a_i$.

We shall call a' a non-trivial factored subform of a if the list a' does not contain any entries
equal to 1 either, and furthermore no permutation of the list a' is identical to the list a.

Next, let us observe that after each iteration of the while loop, the positive integer lists a and
c must each be non-trivial factored subforms of the respective positive integer lists a and c prior to
that iteration of the while loop. Thus, by induction on the number of iterations, after any number
of iterations of the while loop we must have a and c consisting of non-trivial factored subforms of
the original input lists a and c of positive integers.

But the while loop must therefore terminate, because by the fundamental theorem of arithmetic
(unique prime factorization) there cannot exist an unbounded sequence of lists of positive integers

$a^{(0)}, a^{(1)}, a^{(2)}, a^{(3)}, \ldots$

such that each one does not contain the number 1, and such that for all $i \in \mathbb{N}$, $a^{(i+1)}$ is a non-trivial
factored subform of $a^{(i)}$.

Furthermore, for the same reason, the while loop must terminate after a number of iterations 
that is polynomial in the encoding size of the original lists a and c. Namely, the number of iterations 
can not be greater than the number of prime factors of the integers in the lists a and c. 

Furthermore, the encoding size of the lists a, b, c, and d always remains polynomial in the
encoding size of the original input lists. This is so, firstly, because a and c always remain factored
subforms of the original lists, respectively, and furthermore because the maximum value of the
positive integers in lists b and d (which are always the same size as their corresponding lists a and
c), is never more than their maximum value in the original lists b and d, respectively.

It is then clear that each iteration can be carried out in time polynomial in the encoding size
of the original lists, because each iteration of the while loop, when the current lists are given by a, b,
c, and d, requires at most $|a| \cdot |c|$ computations of GCDs on pairs of integers that are no bigger
than the maximum integer in the original lists, and as already established at any iteration the lists
themselves are only polynomially bigger than the original lists. □



3 Deciding inequalities between succinct integers 

We now consider the much harder Problem 2 (inequality testing). We first recall a deep
theorem due to Baker and Wüstholz on explicit quantitative bounds on linear forms in logarithms.^2
Let $a_1, \ldots, a_n$ be positive integers, none of which are equal to 0 or to 1, let $b_1, \ldots, b_n$ be
arbitrary integers not all equal to 0. Let e be the base of the natural logarithm, and define
$B := \max\{|b_1|, |b_2|, \ldots, |b_n|, e\}$. Let $h'(a_i) = \max\{\log a_i, 1\}$, where log denotes the natural
logarithm. Let

$\Lambda(a, b) := \log(a^b) = b_1 \log a_1 + b_2 \log a_2 + \cdots + b_n \log a_n$

For any list a of positive integers, and list b of integers, both of length n, let

$G(a, b) := e^{-C(n) h'(a_1) h'(a_2) \cdots h'(a_n) \log B}$

where $C(n) := 18 (n+1)! \, n^{n+1} 32^{n+2} \log(2n)$.

Theorem 3.1 (Baker-Wüstholz, 1993 [4]). For any list a of positive integers and any list b of
non-zero integers,^3 where both lists have the same length n, if $\Lambda(a, b) \neq 0$, then

$|\Lambda(a, b)| > G(a, b)$.

A lower bound similar to Baker-Wüstholz's was obtained by Waldschmidt [24]. A somewhat
improved bound was obtained more recently by Matveev [18], [19], who showed $|\Lambda(a, b)| > H(a, b)$,
where $H(a, b) := e^{-C'(n) h'(a_1) h'(a_2) \cdots h'(a_n) \log B}$ and $C'(n) := 2.9 (2e)^{2n+6} (n+2)^{9/2}$. (See Nesterenko's
presentation [21].) The improved bound by Matveev does not have any stronger consequences for
our complexity theoretic purposes, beyond that of Theorem 3.1.

These results constitute the current state of the art: they are the best known lower bounds
for (homogeneous) linear forms in logarithms of rational numbers (and for more general numbers).
Next we state an older conjecture of Lang and Waldschmidt, which would significantly strengthen
both Theorem 3.1 and Matveev's improved bound:

Conjecture 3.2 (Lang-Waldschmidt, 1978 (cf. [El [25])). For every $\epsilon > 0$, there is a $C(\epsilon) > 0$,
such that for any list of positive integers a and any list of non-zero integers b, where both lists a
and b are of length n, if $\Lambda(a, b) \neq 0$, then:



^2 The general theorem regards logarithms of algebraic numbers. We will state it in the special case of logarithms
of standard integers, which suffices for our purposes.

^3 Although this will be obvious to experts, let us explain why this theorem is indeed a specialization (to positive
integer $a_i$'s) of Baker-Wüstholz's. The Theorem in [4] allows the $a_i$'s to be algebraic numbers, and $\log a_i$ is defined
to be any determination of the log. Clearly, when $a_i$ is a positive rational integer, $\log a_i$ is uniquely determined.
Furthermore, if d is the degree of the field extension $\mathbb{Q}(a_1, \ldots, a_n)$ over $\mathbb{Q}$, then in [4], the "height" function $h'(a_i)$
is defined instead to be $h'(a_i) = \max\{h(a_i), (1/d)|\log a_i|, 1/d\}$, where $h(a_i)$ is the absolute logarithmic Weil height
of $a_i$, which we will discuss next. First note that in the setting of positive integers $a_i$, clearly d = 1, and thus
$h'(a_i) = \max\{h(a_i), \log a_i, 1\}$. Now, one way to define the absolute logarithmic Weil height $h(a_i)$ (see [23]) is this: let
$p(x) \in \mathbb{Z}[x]$ be the minimal polynomial for the algebraic number $a_i$, suppose p(x) has degree d, let $f_0$ be the leading
coefficient of p(x), and let $\alpha_1, \ldots, \alpha_d$ be the complex roots of p(x) (with repetition). Then

$h(a_i) = \log\left( \left( |f_0| \prod_{j=1}^{d} \max(1, |\alpha_j|) \right)^{1/d} \right)$

Now, in the simple case where $a_i$ is a positive integer, we see immediately that its minimal polynomial is $p(x) = x - a_i$,
and thus that $h(a_i) = \log a_i$. Thus, for positive integers $a_i$, indeed $h'(a_i) = \max\{\log a_i, 1\}$, as given.



$(|b_1| \cdots |b_n| \, |a_1| \cdots |a_n|)^{1+\epsilon}$

We next recall a central conjecture in modern number theory, namely the ABC conjecture (due
to Masser and Oesterlé), and a more recent refinement of the ABC conjecture, given by Baker. See,
e.g., [3], [12], for background on the conjecture. For any integer m, let $N(m) := \prod_{p \mid m} p$ denote
the product of all distinct prime numbers p that divide m (i.e., without repetition of any prime p).

Conjecture 3.3 (ABC conjecture ^7\ [22]). For every $\epsilon > 0$, there is a $K(\epsilon) > 0$, such that for
any positive integers a, b, c, such that a + b = c, and such that a, b, c are relatively prime (i.e., such
that $\mathrm{GCD}(a, b) = 1$), we have:

$c < K(\epsilon) N(abc)^{1+\epsilon}$.
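
As a small worked illustration of the quantity $N(abc)$ in Conjecture 3.3, the sketch below computes N(m) by trial division (feasible only for numbers with small prime factors; the helper name `radical` is hypothetical) and evaluates it on two relatively prime triples with a + b = c.

```python
from math import gcd

def radical(m):
    """N(m): the product of the distinct primes dividing m (N(1) = 1)."""
    m = abs(m)
    rad, p = 1, 2
    while p * p <= m:
        if m % p == 0:
            rad *= p
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:               # whatever remains is a single prime factor
        rad *= m
    return rad

# Triple 1 + 8 = 9: N(1 * 8 * 9) = 2 * 3 = 6, so here c = 9 exceeds N(abc).
small = (1, 8, 9)
# Triple 2 + 3^10 * 109 = 23^5: N(abc) = 2 * 3 * 109 * 23 = 15042.
large = (2, 3 ** 10 * 109, 23 ** 5)
```

Both triples have c larger than N(abc); the conjecture asserts that c can exceed N(abc) only by a factor controlled by $K(\epsilon) N(abc)^{\epsilon}$.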

For any positive integer m, let $\omega(m)$ denote the number of distinct prime numbers that divide
m. In [3], Baker put forward several refinements of the ABC conjecture which make the "constants"
more explicit. Among them was the following:

Conjecture 3.4 (Baker's refinement of the ABC conjecture [3]). There are absolute constants,
K, K' > 0, such that for any integers a, b, c that are relatively prime, and such that a + b + c = 0,
for any $\epsilon > 0$, we have^4

$\max(|a|, |b|, |c|) < K' \epsilon^{-K \omega(abc)} N(abc)^{1+\epsilon}$.

Baker shows in [3] that Conjecture 3.4 implies the following (slightly weakened) variant of the
Lang-Waldschmidt conjecture:

Consequence of Conjecture 3.4 (see [3]): There is some absolute constant K'' such that for
any list of positive integers a and any list of non-zero integers b, where both lists a and b are of
length n, if $\Lambda(a, b) \neq 0$, then

$|\Lambda(a, b)| > e^{-K'' (\log a_1 + \log a_2 + \cdots + \log a_n)(\log \max_i |b_i|)}$   (3)

In fact, Baker further shows in [3] that a more general (p-adic) version of the bound (3) implies
the ABC conjecture. Thus, as noted in [12], the ABC conjecture and such improved quantitative
bounds on linear forms in logarithms are intimately related questions. It is perhaps worth
mentioning that in [6] Baker expresses doubt about the stronger Lang-Waldschmidt Conjecture. However,
he then states a conjecture implying bound (3), which is strong enough for our purposes, and he
notes that it originates from his refined ABC conjecture in [3].

Let us now explore the intimate connection between Problem 2, Theorem 3.1, and Conjectures
3.2, 3.3, and 3.4. Suppose we want to decide whether $a^b > c^d$. Clearly, we can first check for
equality (in P-time). If equality does not hold, then our goal is to decide $a^b > c^d$, knowing $a^b \neq c^d$.
Equivalently, our goal is to decide whether $a^b / c^d > 1$, given that $a^b / c^d \neq 1$. Equivalently, we want
to decide whether $\log(a^b c^{-d}) > 0$, given that $\log(a^b c^{-d}) \neq 0$.

So, Problem 2 is P-time equivalent to the following problem: given positive integers
$a = (a_1, \ldots, a_n)$, and integers $b = (b_1, \ldots, b_n)$, both encoded in binary, decide whether $\Lambda(a, b) > 0$,
with the promise that $\Lambda(a, b) \neq 0$.



^4 It is not obvious that Conjecture 3.4 is a refinement of (i.e., implies) the standard ABC Conjecture 3.3, but as
pointed out in [3] this is the case, the key reason being that for integers n > 1, $\omega(n) \in O(\log(n)/\log(\log(n)))$. Indeed,
a little calculation shows that Conjecture 3.4 implies the ABC conjecture with an explicit bound on $K(\epsilon)$.



We can compute an approximation of the logarithmic form A(a, b), to within any given desired 
additive error, e > 0, in time polynomial in the encoding size of a and b, and in log(l/e). To see 
this, we first observe the well known fact that logarithms of integers can be approximated in P-time 
(in the standard Turing model). This is of course classic. Nevertheless we provide a proof, both 
for completeness and because most treatments of the numerical computation of logarithms only 
consider arithmetic complexity, rather than complexity in the Turing model. Recall we use $\log(x)$
to denote the natural logarithm of x, and use $\log_2(x)$ to denote the log base 2.

Proposition 3.5. There is an algorithm that, given a positive integer a, encoded in binary, and
given a positive integer j, encoded in unary, computes a rational value $v_a$, such that

$|\log(a) - v_a| < 2^{-j}$

The algorithm runs in time polynomial in j and $\log_2(a)$ (in the Turing model).
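
One way to realize the guarantee of Proposition 3.5 with exact rational arithmetic is sketched below. It is not the appendix proof: it is a textbook approach, assumed here for illustration, that writes $a = 2^k m$ with $1 \le m < 2$ and uses the series $\log(x) = 2\,\mathrm{atanh}((x-1)/(x+1))$, whose argument is at most 1/3 for both x = 2 and x = m. The names `atanh_series` and `approx_log` are hypothetical.

```python
from fractions import Fraction

def atanh_series(z, terms):
    """Partial sum z + z^3/3 + z^5/5 + ... of atanh(z), in exact rationals."""
    z = Fraction(z)
    total, power, z2 = Fraction(0), z, z * z
    for i in range(terms):
        total += power / (2 * i + 1)
        power *= z2
    return total

def approx_log(a, j):
    """Return a rational v with |log(a) - v| < 2**-j, for an integer a >= 1."""
    k = a.bit_length() - 1             # a = 2**k * m with 1 <= m < 2
    m = Fraction(a, 2 ** k)
    # For |z| <= 1/3 the tail after N terms of 2*atanh(z) is below (3/4) * 9**-N,
    # so the total error is below (k + 1) * 8**-N; choose N to make that <= 2**-j.
    terms = (j + (k + 1).bit_length() + 2) // 3
    log_2 = 2 * atanh_series(Fraction(1, 3), terms)
    log_m = 2 * atanh_series((m - 1) / (m + 1), terms)
    return k * log_2 + log_m
```

The number of series terms is linear in j plus the bit length of k, and every intermediate rational has polynomially many bits, which is consistent with the running time claimed in the proposition; this sketch does not try to optimize either quantity.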

Proposition 3.5 is proved in the appendix. We can use it to easily prove:

Proposition 3.6. Given as input positive integers $a = (a_1, \ldots, a_n)$ and integers $b = (b_1, \ldots, b_n)$,
both encoded in binary, and given a positive integer j (given in unary), there is an algorithm that
runs in P-time (i.e., in time polynomial in j and in the encoding size of lists a and b) and outputs a
rational value $v_{a,b}$, such that $|\Lambda(a, b) - v_{a,b}| < 2^{-j}$.

Proof. Given $a = (a_1, \ldots, a_n)$, and given $b = (b_1, \ldots, b_n)$, for each $i = 1, \ldots, n$ we will first compute
an approximation $v_{a_i}$ of $\log(a_i)$ such that

$|\log(a_i) - v_{a_i}| < \frac{1}{2^{j + \log_2(|b_i|) + \log_2(n)}}$

By Proposition 3.5 we can compute such a $v_{a_i}$ in time polynomial in the encoding size of the
input, a and b, and in j.

Then we let $v_{a,b} := \sum_{i=1}^{n} b_i v_{a_i}$. Finally we observe that this yields:

$$
\begin{aligned}
|\Lambda(a, b) - v_{a,b}| &= \left| \sum_{i=1}^{n} \left( b_i \log(a_i) - b_i v_{a_i} \right) \right| \\
&\leq \sum_{i=1}^{n} |b_i| \, |\log(a_i) - v_{a_i}| \\
&< \sum_{i=1}^{n} |b_i| \, 2^{-(j + \log_2(|b_i|) + \log_2(n))} \\
&= \sum_{i=1}^{n} 2^{-j} / n = 2^{-j}
\end{aligned}
$$

Thus the rational number $v_{a,b}$, which can be computed in time polynomial in j, and in the
encoding size of a and b, is the desired additive approximation of $\Lambda(a, b)$. □

Proposition 3.7.

1. If the ABC conjecture, as refined by Baker (Conjecture 3.4), holds, or if the Lang-Waldschmidt
conjecture (Conjecture 3.2) holds, then Problem 2 is decidable in P-time.

2. When m and n are fixed constants, Problem 2 is (unconditionally) decidable in P-time.

Proof. To prove both (1.) and (2.) we simply compute a sufficiently good additive approximation
to $\Lambda(a, b)$, using Proposition 3.6. To obtain (1.), we then apply the ABC conjecture (Conjecture 3.4) and its
consequence (3), or the Lang-Waldschmidt Conjecture (Conjecture 3.2). Likewise, to obtain (2.),
after approximating $\Lambda(a, b)$ we apply the Baker-Wüstholz Theorem (Theorem 3.1).

In more detail, we start by proving (2.): we are given lists a and b of length n and lists c
and d of length m, with m and n fixed constants. Let $A := \max\{\max_i a_i, \max_j c_j\}$. Let
$B := \max\{|b_1|, \ldots, |b_n|, |d_1|, \ldots, |d_m|, e\}$. First we check in P-time if $a^b = c^d$. If this is the case, then we
are done. So assume that $a^b \neq c^d$. If $a^b c^{-d} \neq 1$, i.e., $\log(a^b c^{-d}) \neq 0$, then by Theorem 3.1 we have

$|\log(a^b c^{-d})| > 2^{-K \left( \prod_{i=1}^{n} \log(a_i) \right) \left( \prod_{j=1}^{m} \log(c_j) \right) \log B} \geq 2^{-K (\log A)^{n+m} \log B}$   (4)

for some fixed constant K (that depends on n and m). But by Proposition 3.6, when m and n are
fixed constants, we can compute in time polynomial in the encoding size of a, b, c and d, a rational
value $v_{a,b,c,d}$ such that

$|\log(a^b c^{-d}) - v_{a,b,c,d}| < 2^{-K (\log A)^{n+m} \log B} / 2$   (5)

Note that $v_{a,b,c,d} \neq 0$, since otherwise (5) would contradict (4).
Now suppose $v_{a,b,c,d} > 0$. Then if $\log(a^b c^{-d}) < 0$, we would have $|\log(a^b c^{-d}) - v_{a,b,c,d}| = v_{a,b,c,d} +
|\log(a^b c^{-d})| > 2^{-K (\log A)^{n+m} \log B}$, the last inequality following by (4). However, this contradicts the
inequality (5) just given. Thus if $v_{a,b,c,d} > 0$, then it follows that $\log(a^b c^{-d}) > 0$, i.e., that $a^b > c^d$.
Similarly, suppose $v_{a,b,c,d} < 0$. Then if $\log(a^b c^{-d}) > 0$, we would have $|\log(a^b c^{-d}) - v_{a,b,c,d}| =
|v_{a,b,c,d}| + |\log(a^b c^{-d})| > 2^{-K (\log A)^{n+m} \log B}$, by inequality (4). However, again, this contradicts (5).
Thus if $v_{a,b,c,d} < 0$ then $\log(a^b c^{-d}) < 0$, i.e., $a^b < c^d$.

To prove (1.), e.g., in the case where we use the consequences of Baker's refined ABC conjecture
(Conjecture 3.4, in particular the bound (3)), exactly the same argument goes through if we instead
compute an approximation $v_{a,b,c,d}$ such that

$|\log(a^b c^{-d}) - v_{a,b,c,d}| < 2^{-K'' \left( \left( \sum_i \log(a_i) \right) + \left( \sum_j \log(c_j) \right) \right) (\log B)}$

which again, we know we can do in time polynomial in the encoding size of a, b, c, and d.
Similarly, exactly the same argument goes through for proving (1.) based on the Lang-Waldschmidt
conjecture.

□
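
The decision procedure in the proof can be sketched in code. The proof fixes the precision in advance from a lower bound on $|\Lambda|$; the sketch below instead uses a different, adaptive strategy, doubling the precision until the approximation is bounded away from 0, so it needs no explicit constant. Under the promise $\Lambda \neq 0$ it always terminates, and the bounds discussed above are what would limit how many bits it needs. All names (`_atanh`, `_log`, `sign_of_log_form`, `poe_greater`, `max_bits`) are hypothetical, and the logarithm routine is an assumed series-based stand-in for Proposition 3.5.

```python
from fractions import Fraction

def _atanh(z, terms):
    """Partial sum z + z^3/3 + z^5/5 + ... of atanh(z), in exact rationals."""
    z = Fraction(z)
    total, power, z2 = Fraction(0), z, z * z
    for i in range(terms):
        total += power / (2 * i + 1)
        power *= z2
    return total

def _log(a, j):
    """Rational v with |log(a) - v| < 2**-j for an integer a >= 1."""
    k = a.bit_length() - 1                      # a = 2**k * m, 1 <= m < 2
    m = Fraction(a, 2 ** k)
    terms = (j + (k + 1).bit_length() + 2) // 3
    return (2 * k * _atanh(Fraction(1, 3), terms)
            + 2 * _atanh((m - 1) / (m + 1), terms))

def sign_of_log_form(a, b, max_bits=1 << 12):
    """Sign (+1 or -1) of Lambda = sum b[i] * log(a[i]), promised non-zero."""
    n_bits = (len(a) - 1).bit_length()          # at least log2(n)
    j = 8
    while j <= max_bits:
        # As in Proposition 3.6: extra bits per term absorb |b_i| and n.
        v = sum((bi * _log(ai, j + abs(bi).bit_length() + n_bits)
                 for ai, bi in zip(a, b)), Fraction(0))
        if abs(v) >= Fraction(1, 2 ** j):       # |Lambda - v| < 2**-j
            return 1 if v > 0 else -1
        j *= 2
    raise ValueError("undecided within max_bits; Lambda may be zero")

def poe_greater(a, b, c, d):
    """Decide a^b > c^d, assuming a^b != c^d (check equality first)."""
    exps = list(b) + [-e for e in d]
    return sign_of_log_form(list(a) + list(c), exps) > 0
```

A caller would first run an equality test for Problem 1 and only then call `poe_greater`, mirroring the reduction to the promise problem described above.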

4 Maximum probability parsing for SCFGs 

We now describe an application to the problem of computing a maximum probability parse tree for
a given string on a given arbitrary stochastic context-free grammar (SCFG). For particular classes
of grammars (e.g., those in Chomsky Normal Form) there are well known dynamic programming
algorithms for this (based on the CKY parsing algorithm), which generalize the well-known Viterbi
algorithm for HMMs [16]. For arbitrary SCFGs with ε-rules, the problem is more involved, and
there do not exist good complexity bounds in the literature.

Definitions and Background for SCFGs. An SCFG $G = (V, \Sigma, R, S, p)$ consists of a finite set
V of nonterminals, a start nonterminal $S \in V$, a finite set $\Sigma$ of alphabet (terminal) symbols, and a
finite list of rules, $R \subseteq V \times (V \cup \Sigma)^*$, where each rule $r \in R$ is a pair $(A, \gamma)$, which we usually denote
by $A \to \gamma$, where $A \in V$ and $\gamma \in (V \cup \Sigma)^*$. Finally $p : R \to \mathbb{R}^{+}$ maps each rule $r \in R$ to a positive
probability, $p(r) > 0$. For computational purposes we assume as usual that the rule probabilities
are rational numbers, given as the ratios of two integers written in binary. We often denote a rule
$r = (A \to \gamma)$ together with its probability by writing $A \xrightarrow{p(r)} \gamma$. Note that we allow $\gamma \in (V \cup \Sigma)^*$ to
possibly be the empty string, denoted by ε. A rule of the form $A \to \epsilon$ is called an ε-rule. For a rule
$r = (A \to \gamma)$, we let left(r) := A and right(r) := $\gamma$. We let $R_A = \{r \in R \mid \mathrm{left}(r) = A\}$. For
$A \in V$, let $p(A) = \sum_{r \in R_A} p(r)$. An SCFG must satisfy that $\forall A \in V$, $p(A) \leq 1$. An SCFG is called
proper if $\forall A \in V$, $p(A) = 1$.

For a SCFG, G, strings $\alpha, \beta \in (V \cup \Sigma)^*$, and $\pi = r_1 \ldots r_k \in R^*$, we write $\alpha \Rightarrow^{\pi} \beta$ if the
leftmost derivation starting from $\alpha$, and applying the sequence $\pi$ of rules, derives $\beta$. We define the
probability $p(\alpha \Rightarrow^{\pi} \beta)$ of the derivation to be $p(\alpha \Rightarrow^{\pi} \beta) = \prod_{i=1}^{k} p(r_i)$ if $\alpha \Rightarrow^{\pi} \beta$, and let $p(\alpha \Rightarrow^{\pi} \beta) = 0$
otherwise. If $A \Rightarrow^{\pi} w$ for $A \in V$ and $w \in \Sigma^*$, we say that $\pi$ is a complete derivation from A and its
yield is $y(\pi) = w$. There is a natural one-to-one correspondence between the complete (leftmost)
derivations of w starting at A and the parse trees of w rooted at A, and this correspondence
preserves probabilities. Recall that a parse tree is a rooted, ordered finite tree, where every leaf v is
labeled with a symbol $l(v) \in \Sigma \cup \{\epsilon\}$, every internal (non-leaf) node v is labeled with a nonterminal
$l(v) \in V$ and has an associated rule $r(v) \in R_{l(v)}$ whose right-hand side is the concatenation of the
labels of the children of v. The yield of the parse tree is the concatenation of the labels of the
leaves. The probability of the parse tree is the product of the probabilities of the rules associated
with its internal nodes.

For a nonterminal $A$ and a string $w$, the maximum parse tree probability for $w$, starting at
$A$, is defined to be $p^{\max}_{A,w} = \max_{\pi \in R^*} p(A \stackrel{\pi}{\Rightarrow} w)$. The total probability of generating $w$ starting
at $A$ is defined by $p_{A,w} = \sum_{\pi \in R^*} p(A \stackrel{\pi}{\Rightarrow} w)$. Given an SCFG, $G = (V, \Sigma, R, S, p)$, and given a
string $w \in \Sigma^*$, the goal of maximum probability parsing is to compute $p^{\max}_{S,w}$ and also to compute (a
representation of) a maximum probability parse tree, i.e., $\arg\max_{\pi \in R^*} p(S \stackrel{\pi}{\Rightarrow} w)$. In the following
we will also use $p^{\max}_{G,w}$ to denote the maximum probability $p^{\max}_{S,w}$ of a parse tree for $w$ from the start
nonterminal $S$ of $G$.

For general SCFGs, $G$, that have $\epsilon$-rules, the maximum probability parse tree of a string $w$ (even
just the string $w = \epsilon$) may have exponential size in the size of the grammar, and the corresponding
maximum probability may need an exponential number of bits when written as the ratio of two
integers. The probability can be specified more compactly in PoE notation. For any SCFG $G$
and string $w$, polynomial size (in $|G|$ and $|w|$) always suffices to represent the maximum parsing
probability in PoE notation: the bases of the expression are (a subset of) the given rule probabilities
of $G$ and the exponents are the numbers of occurrences of the rules in the optimal parse tree; the
numbers of occurrences are at most exponential and thus can be written in a polynomial number
of bits.
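To make the size argument concrete, here is a small sketch of PoE numbers stored as a map from bases to binary exponents, where multiplication simply adds exponents. The dictionary encoding, the helper names `poe_mul` and `poe_bits`, and the chain grammar described in the comment are assumptions made for illustration only. Comparing two PoE numbers (Problem 2) is the hard part and is not attempted here.

```python
from fractions import Fraction

def poe_mul(x, y):
    """Multiply two PoE numbers given as {base: exponent} dictionaries."""
    out = dict(x)
    for base, exp in y.items():
        out[base] = out.get(base, 0) + exp
    return out

def poe_bits(x):
    """Total number of bits needed to write the exponents in binary."""
    return sum(e.bit_length() for e in x.values())

# Hypothetical chain grammar: A_k -> A_{k-1} A_{k-1} with probability 1 for
# k = 1..200, and A_0 -> epsilon with probability 1/3. The only parse tree of
# epsilon rooted at A_200 uses the rule for A_0 exactly 2**200 times.
p = {Fraction(1, 3): 1}
for _ in range(200):
    p = poe_mul(p, p)        # squaring doubles the exponent
```

After the loop, `p` represents $(1/3)^{2^{200}}$ with a single 201-bit exponent, whereas writing the same number as a ratio of two binary integers would take more than $2^{200}$ bits.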

The optimal parse tree can be specified more compactly using a DAG representation that
identifies isomorphic subtrees. Formally, a parse DAG is a rooted, ordered DAG $D$ (i.e., the DAG
has a single source, and the children of every node are ordered), where every sink (leaf) node is
labeled with a symbol $l(v) \in \Sigma \cup \{\epsilon\}$, every other (non-sink) node $v$ is labeled with a nonterminal
$l(v) \in V$ and has an associated rule $r(v) \in R_{l(v)}$ whose right-hand side is the concatenation of
the labels of the children of $v$. For every node $v$ we can define inductively in a bottom-up order
the yield and probability of the subDAG $D[v]$ rooted at $v$: If $v$ is a leaf then the yield is $l(v)$
and the probability is 1. If $v$ is an internal node, then the yield of $D[v]$ is the concatenation of
the yields of the children of $v$, and the probability of $D[v]$ is the product of the probability of the
rule $r(v)$ and the probabilities of the subDAGs rooted at the children of $v$. The parse DAG $D$
corresponds to a parse tree $T$ obtained by replicating nodes so that every node other than the root
has a unique parent; the yield and probability of the DAG are equal to the yield and probability
of the corresponding tree. As we shall see, for every SCFG $G$ and string $w$ in the language $L(G)$
of $G$ (i.e., that has non-zero probability), there is a maximum probability parse DAG for $w$ of size
polynomial in $|G|$ and $|w|$.^1 Our goal is to construct such a maximum probability parse DAG.
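The bottom-up definition of yield and probability can be sketched as a memoized traversal, so that a shared subDAG is evaluated only once. The node names, the dictionary layout and the function `evaluate` are hypothetical; probabilities are kept as exact fractions here, which is only reasonable for small examples (in general they would be kept in PoE notation).

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical parse DAG: internal node -> (probability of its rule, children).
# Anything that is not a key is a sink: a terminal symbol, or "" for epsilon.
DAG = {
    "n0": (Fraction(1, 2), ("n1", "n1")),   # A -> B B, with both children shared
    "n1": (Fraction(1, 3), ("a", "n2")),    # B -> a C
    "n2": (Fraction(1, 5), ("",)),          # C -> epsilon
}

@lru_cache(maxsize=None)
def evaluate(node):
    """Return (yield, probability) of the subDAG rooted at node."""
    if node not in DAG:                      # sink: yield is its label, probability 1
        return node, Fraction(1)
    prob, children = DAG[node]
    word = ""
    for child in children:
        child_word, child_prob = evaluate(child)
        word += child_word
        prob *= child_prob
    return word, prob
```

Here `evaluate("n0")` visits `n1` once even though the corresponding parse tree contains two copies of that subtree.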

We will say that an SCFG, $G = (V, \Sigma, R, S, p)$, is in Simple Normal Form (SNF) if every
nonterminal $A \in V$ belongs to one of the following three types:

1. type L: every rule $r \in R_A$ has the form $A \xrightarrow{p(r)} B$.

2. type Q: there is a single rule in $R_A$: $A \xrightarrow{1} BC$, for some $B, C \in V$.

3. type T: there is a single rule in $R_A$: either $A \xrightarrow{1} \epsilon$, or $A \xrightarrow{1} a$ for some $a \in \Sigma$.

An SCFG is said to be in Chomsky Normal Form (CNF) if it satisfies the following conditions:

• The grammar does not contain any $\epsilon$-rule except possibly for a rule $S \to \epsilon$ associated with
the start nonterminal $S$; if it does contain such a rule, then $S$ does not appear on the right
hand side of any rule in the grammar.

• Every rule, other than $S \to \epsilon$, is either of the form $A \to BC$, or of the form $A \to a$, where $A$,
$B$, and $C$ are nonterminals in $V$ and $a \in \Sigma$ is a terminal symbol.

We shall show below that every SCFG can be converted efficiently in P-time, to an equivalent 
SCFG that is in SNF form, where the equivalence also entails a probability-preserving bijection 
between parse trees of strings in the two grammars. 

Unlike SNF form, conversion of even an ordinary context-free grammar to CNF form does not 
in general preserve a bijection between parse trees of the original grammar and those of the CNF 
grammar. This is so even when we ignore the additional issue of needing to preserve the probability 
of corresponding (e.g., maximum probability) parse trees for a given string, in the setting of SCFGs. 

Furthermore, as shown in [10], it is not possible in general to convert an arbitrary SCFG to
one in CNF form which preserves even just the probabilities of generating every terminal string, 
without having to introduce irrational rule probabilities even when all rule probabilities of the 
original SCFG were rational. See [10] for a considerably involved P-time algorithm that converts 
an SCFG to an approximately equivalent CNF form SCFG. Here "approximately equivalent" only 
refers to preservation of the probability of generating strings up to a given length, and not to 
preservation of a correspondence between parse trees (which is not doable in general), and thus 
such a conversion is not suitable when our goal is to compute, e.g., a maximum probability parse 
tree for a given string on a given SCFG. 

Lemma 4.1. (Lemma B.5 of [10]) Any SCFG, $G = (V, \Sigma, R, S, p)$, with rational rule probabilities,
can be converted in linear time to an SCFG, $G' = (V', \Sigma, R', S, p')$, in SNF form, such that $V \subseteq V'$,
such that $G'$ has the same set of rational values as rule probabilities (possibly with some additional
rules having probability 1), and such that for every nonterminal $A \in V$ and string $w \in \Sigma^*$, there
is a probability preserving bijection between parse trees of $w$ rooted at $A$ in $G$ and parse trees of $w$
rooted at $A$ in $G'$. Moreover, given a parse tree for $w$ rooted at $A$ in $G$ we can easily recover the
corresponding parse tree in $G'$, and vice versa.

^1 The succinctness of the DAG representation is essential only for derivations of $\epsilon$ from nonterminals. In particular,
for every SCFG $G$ and $w \in L(G)$, there is a maximum probability parse DAG for $w$, of size polynomial in $|G|$ and
$|w|$, which consists of a tree, some of whose leaves are replaced by DAGs with yield $\epsilon$.

The proof is directly analogous to related proofs in [10], and simply involves adding new auxiliary
nonterminals and new auxiliary rules, each having probability 1, to suitably "abbreviate" the
sequences of symbols, $\gamma$, that appear on the right hand side (RHS) of rules $A \to \gamma$, whenever
$|\gamma| \ge 3$. We do this repeatedly until for all such RHSs, $\gamma$, we have $|\gamma| \le 2$. To obtain the normal
form, we may then also need to introduce nonterminals that generate a single terminal symbol with
probability 1. The resulting SCFG is guaranteed to be at most polynomially larger (in fact, only
linearly larger) if we are careful.
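The abbreviation step alone can be sketched as follows. This is only the first part of the conversion, under assumptions made here: the function name `abbreviate`, the tuple encoding of rules, and the fresh names `_N1`, `_N2`, ... (assumed not to clash with existing symbols) are all hypothetical, and the further steps needed to reach SNF (for instance, wrapping terminals in their own nonterminals) are not shown.

```python
from fractions import Fraction

def abbreviate(rules):
    """Split every rule A -> X1 ... Xk with k >= 3 into rules with two symbols
    on the right, using fresh nonterminals whose single rule has probability 1."""
    out, fresh = [], 0
    for left, right, prob in rules:
        while len(right) >= 3:
            fresh += 1
            aux = f"_N{fresh}"                       # fresh auxiliary nonterminal
            out.append((left, (right[0], aux), prob))
            # aux abbreviates the remaining suffix and carries probability 1
            left, right, prob = aux, right[1:], Fraction(1)
        out.append((left, right, prob))
    return out
```

Each original rule of length $k \ge 3$ yields $k - 1$ new rules, so the output is only linearly larger, and every parse tree of the original grammar corresponds to exactly one parse tree of the new grammar with the same probability.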

We give an efficient algorithm operating in the unit-cost arithmetic RAM model with only multi- 
plication ({*}) operations and comparisons permitted, for computing the maximum likelihood parse 
tree of a given string. 

Theorem 4.2. Given any SCFG, $G$, in SNF form, with rational rule probabilities,^2 and given a string
$w \in \Sigma^*$, there is an algorithm that computes the maximum parse tree probability $p^{\max}_{G,w}$, and if
$p^{\max}_{G,w} > 0$ then it also computes a succinct DAG representation of a maximum probability parse
tree, $t^{\max}_{G,w}$, for $w$.

The algorithm runs in polynomial time in the unit-cost arithmetic RAM model of computation 
with only multiplication operations (and comparisons) allowed. And thus, it is P-time Turing 
reducible to Problem 2: inequality comparison of integers in succinct PoE representation. 

Proof. Given the SCFG, $G$, we first compute, for every nonterminal $A$, the maximum probability
$p^{\max}_{A,\epsilon}$ of any finite parse tree that starts at nonterminal $A$ and generates the empty string $\epsilon$.

We do this using a variant of Dijkstra's shortest path algorithm, due to Knuth [14], which works 
not on finite graphs but on weighted context-free grammars, for generating a parse tree with smallest 
sum total weight (as well as other classes of weight functions). See the survey on probabilistic 
parsing by Nederhof and Satta [20] (their Figure 5) where they nicely describe Knuth's algorithm 
applied to the specific problem of computing, for any given SCFG, the maximum probability of 
any finite parse tree. We follow Nederhof and Satta's description ([20]).

Given the original SCFG, $G$, and given a nonterminal $A$, if we are interested in computing $p^{\max}_{A,\epsilon}$,
we first remove from $G$ all rules of the form $B \to \gamma$, for any nonterminal $B$, and string $\gamma$ where $\gamma$
contains at least one terminal symbol $a \in \Sigma$, because such rules can't possibly help us generate
the empty string $\epsilon$.

Let us call the resulting SCFG after these removals $G'$.^3

Every finite parse tree of the remaining SCFG, $G'$, must generate the empty string, because no
terminal symbols are left. Moreover, there is a one-to-one probability preserving correspondence
between parse trees of $G$ generating $\epsilon$ and parse trees of $G'$, starting at any nonterminal $A$
in the two SCFGs, respectively.



^2 We can even allow irrational rule probabilities, since for this we assume a unit-cost RAM model of computation.
^3 Some nonterminals $A_i$ in $G'$ may now not have a set of rules $R_{A_i}$ associated with them whose probabilities
sum to 1, because we have removed some rules. This causes no problem in our computations: we are interested in
probabilities of parse trees that don't involve the removed rules, and these remain the same as in $G$.
$D := \Sigma \cup \{\epsilon\}$;
Initialize: $p^{A}_{G',\max} := 0$, for every nonterminal $A$;
Define $p^{a}_{G',\max} := 1$ for all $a \in D$.
while ($F = \{ B \mid B \notin D \wedge \exists\, r = (B \to X_1 \dots X_k)$ where $X_1, \dots, X_k \in D \} \ne \emptyset$) do
    for all $B \in F$ do
        $q(B) := \max_{\{ r = (B \to X_1 \dots X_k) \mid X_1, \dots, X_k \in D \}} \; p(r) \cdot p^{X_1}_{G',\max} \cdots p^{X_k}_{G',\max}$;
        $m(B) := \arg\max_{\{ r = (B \to X_1 \dots X_k) \mid X_1, \dots, X_k \in D \}} \; p(r) \cdot p^{X_1}_{G',\max} \cdots p^{X_k}_{G',\max}$;
    end for
    Let $C := \arg\max_{C \in F} q(C)$;
    $p^{C}_{G',\max} := q(C)$;
    $D := D \cup \{C\}$;
    $t^{C}_{G',\max} := m(C)$;
    (where $m(C)$ only describes the root rule of tree $t^{C}_{G',\max}$, and $t^{C}_{G',\max}$ can thus be
    viewed as being encoded succinctly as a straight-line program, i.e., a DAG.)
end while
For every nonterminal $A$: output $p^{A}_{G',\max}$, and if $p^{A}_{G',\max} > 0$ then also output $t^{A}_{G',\max}$.

Figure 2: Knuth's variant of Dijkstra's algorithm, for computing a globally maximum probability
parse tree starting from every nonterminal $A$ (see [20]).
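The procedure of Figure 2, together with the preceding removal of rules that contain terminals, might be sketched in Python as below. This is an unoptimized illustration rather than the authors' implementation: the function name `knuth_max_epsilon`, the tuple encoding of rules and the example grammar are assumptions, and exact fractions stand in for the PoE arithmetic that the general case requires.

```python
from fractions import Fraction

def knuth_max_epsilon(rules, nonterminals):
    """rules: list of (left, right_tuple, prob), with () encoding epsilon.
    Returns (best, root): best[A] is the maximum probability of a parse tree
    rooted at A with yield epsilon (0 if none exists); root[A] is its root rule."""
    # Build G': drop every rule whose right-hand side contains a terminal.
    eps_rules = [r for r in rules if all(x in nonterminals for x in r[1])]
    best, root = {}, {}          # keys are the nonterminals already added to D
    while True:
        cand = None              # (q(C), C, m(C)) for the best C in F so far
        for rule in eps_rules:
            left, right, prob = rule
            if left in best or any(x not in best for x in right):
                continue         # left already in D, or some X_i not yet in D
            value = prob
            for x in right:      # multiplications only
                value *= best[x]
            if cand is None or value > cand[0]:
                cand = (value, left, rule)
        if cand is None:         # F is empty
            break
        best[cand[1]], root[cand[1]] = cand[0], cand[2]
    for a in nonterminals:       # never added to D: no finite parse tree
        best.setdefault(a, Fraction(0))
    return best, root

# Made-up example grammar.
RULES = [
    ("A", (), Fraction(1, 10)),
    ("A", ("B", "B"), Fraction(9, 10)),
    ("B", (), Fraction(1, 2)),
    ("B", ("a",), Fraction(1, 2)),     # removed when building G'
    ("C", ("C",), Fraction(1)),        # C has no finite parse tree
]
best, root = knuth_max_epsilon(RULES, {"A", "B", "C"})
```

In this example `B` is finalized first with value 1/2, after which the rule `A -> B B` (value 9/40) beats `A -> epsilon` (value 1/10). The greedy choice is sound for the same reason as in Dijkstra's algorithm: all rule probabilities are at most 1, so extending a tree never increases its probability.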

Therefore, computing $p^{\max}_{A,\epsilon}$ is equivalent to computing the maximum probability of any finite
parse tree starting at $A$ in $G'$. Let us denote this probability by $p^{A}_{G',\max}$.

Knuth's variant of Dijkstra's algorithm is described in Figure 2, which is taken from [20] (their
Figure 5). It computes $p^{B}_{G',\max}$ for every nonterminal $B$ of a given SCFG $G'$, until it has computed
$p^{A}_{G',\max}$, or else discovered that $p^{A}_{G',\max} = 0$. It does so by iteratively finding a nonterminal $B$ for
which $p^{B}_{G',\max}$ has not yet been computed, and such that, among all such nonterminals, $B$ gives rise
to the maximum product of the probability of a rule $B \to \gamma$ times the maximum parse tree probabilities
$p^{B'}_{G',\max}$ for all nonterminal occurrences $B'$ in $\gamma$.

It is not difficult to check that this algorithm works correctly, for the same reason that Dijkstra's
algorithm works correctly. It computes $p^{A}_{G',\max} = p^{\max}_{A,\epsilon}$ and furthermore also computes a DAG
representation (straight-line program representation) of a maximum probability parse tree $t^{A}_{G',\max}$,
starting at $A$, for every nonterminal $A$ of both $G'$ and $G$. (Note that this algorithm does not
require the SCFG $G$ to be in SNF form. We will exploit SNF form only later, in the final dynamic
programming step of the algorithm.)

Note, furthermore, that the algorithm requires at most a polynomial number of arithmetic 
operations, specifically, it requires multiplications only (and this fact will be important for us later), 
as well as comparison operations (for allowing us to take the maximum over a finite set of values). 

Thus, the algorithm clearly runs in polynomial time in the unit-cost arithmetic RAM model of 
computation. 

However, note that the probabilities $p^{\max}_{A,\epsilon}$ can clearly be extremely small positive numbers in
the worst case (doubly exponentially small, on the order of $1/2^{2^{n}}$ for a grammar with $n$ nonterminals),
and thus we cannot encode them in standard binary notation in polynomial size. Nevertheless, since
only multiplications are being used over a set of input rule
probabilities, $p(r)$, we can encode these probabilities succinctly in PoE, by specifying (in binary)
the non-negative integer power of each $p(r)$. We can compute these numbers with the same number
of arithmetic operations over PoE notation, as long as we can compare PoE numbers efficiently.

Having computed $p^{\max}_{A,\epsilon}$ for every nonterminal $A$ of $G$, we next want to use these quantities to
compute $p^{\max}_{G,w}$ for an arbitrary string $w \in \Sigma^*$.

In the next step, we will use another application of Dijkstra's algorithm, this time on a finite
graph, to compute, for every pair of nonterminals $A, B$, the maximum probability, $p^{\max}_{A,B}$, that
a derivation of $G$ starting at nonterminal $A$ eventually yields the single nonterminal symbol $B$
as its yield. We do this as follows: construct an edge-labeled directed graph, $H = (V, E)$, with
edges labeled by probabilities, with the nonterminals $V$ of $G$ as its nodes, and such that, for each
rule $A \xrightarrow{p} B$, we have the corresponding edge $(A, p, B) \in E$. For each rule $r = (A \to BC)$ such that
$p^{\max}_{C,\epsilon} > 0$, we place the edge $(A, p', B) \in E$, where $p' = p(r) \cdot p^{\max}_{C,\epsilon}$. Similarly, if $p^{\max}_{B,\epsilon} > 0$, then we
put $(A, p', C) \in E$, where $p' = p(r) \cdot p^{\max}_{B,\epsilon}$. For every pair of nodes $A, B$, if there are multiple edges
$(A, p, B)$ and $(A, p', B)$ in $E$, we only keep one edge with the maximum probability.

We then run Dijkstra's algorithm from every node $A$ to compute the maximum probability
path from $A$ to every other node $B$ in $H$, and we let $p^{\max}_{A,B}$ denote the product of the probabilities
along that path. Note that in this case again, Dijkstra's algorithm only requires polynomially
many multiplication operations (no additions) and comparisons, and thus runs in P-time in the
unit-cost RAM model of computation, whatever the encoding of the probabilities labeling $H$ is.
While running Dijkstra's algorithm we can also retain the maximum probability paths themselves,
and combine them with the already computed maximum probability parse trees $t^{\max}_{C,\epsilon}$ encountered
along the path, in order to compute a DAG representation of a maximum probability parse tree
$t^{\max}_{A,B}$ in $G$ with root $A$ and with "yield" $B$. The following claim is not difficult to prove:

Claim: This algorithm correctly computes the maximum probability $p^{\max}_{A,B}$ of a parse tree in $G$
with root $A$ and yield $B$, and also computes a succinct DAG representation $t^{\max}_{A,B}$ of such a parse
tree.
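The construction of $H$ and the max-product variant of Dijkstra's algorithm can be sketched as follows for a grammar in SNF. The function names `build_graph` and `max_product_paths`, the dictionary `p_eps` holding the values $p^{\max}_{A,\epsilon}$, and the example grammar are assumptions made for illustration; a simple quadratic node-selection loop replaces a priority queue to keep the sketch short.

```python
from fractions import Fraction

def build_graph(rules, p_eps):
    """Edges of H as {(A, B): weight}, keeping only the best parallel edge.
    p_eps[A] is the maximum probability that A derives epsilon (keys: all nonterminals)."""
    edges = {}
    def add(a, b, w):
        if w > edges.get((a, b), 0):            # zero-weight edges are never added
            edges[(a, b)] = w
    for left, right, prob in rules:
        if len(right) == 1 and right[0] in p_eps:   # rule A -> B
            add(left, right[0], prob)
        elif len(right) == 2:                       # rule A -> B C
            b, c = right
            add(left, b, prob * p_eps[c])           # C is erased
            add(left, c, prob * p_eps[b])           # B is erased
    return edges

def max_product_paths(edges, source):
    """Dijkstra with (max, *) in place of (min, +); sound since all weights are <= 1."""
    best = {source: Fraction(1)}                    # the empty path has probability 1
    done = set()
    while True:
        pending = [v for v in best if v not in done]
        if not pending:
            return best
        u = max(pending, key=best.get)              # finalize the most probable node
        done.add(u)
        for (a, b), w in edges.items():
            if a == u and b not in done and best[u] * w > best.get(b, 0):
                best[b] = best[u] * w

# Made-up SNF grammar: S is of type L, A of type Q, C and D of type T.
RULES = [
    ("S", ("A",), Fraction(3, 4)), ("S", ("D",), Fraction(1, 4)),
    ("A", ("D", "C"), Fraction(1)),
    ("C", (), Fraction(1)),
    ("D", ("a",), Fraction(1)),
]
P_EPS = {"S": Fraction(0), "A": Fraction(0), "C": Fraction(1), "D": Fraction(0)}
edges = build_graph(RULES, P_EPS)
paths_from_S = max_product_paths(edges, "S")
```

In this example the best way for `S` to reach `D` goes through `A` and erases `C` (probability 3/4), which beats the direct rule `S -> D` (probability 1/4). Nodes missing from the returned dictionary are unreachable, i.e. the corresponding $p^{\max}_{A,B}$ is 0.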

We are finally ready to iteratively compute the probabilities $p^{\max}_{A,w}$ for an arbitrary string $w \in \Sigma^{+}$.
Let $w = w_1 \dots w_n$ be the given string, with $w_i \in \Sigma$. For $i \in \{1, \dots, n\}$, and $j \in \{1, \dots, n - i + 1\}$, let
us define $p^{A}_{\max,i,j}$ to be the maximum probability of any parse tree rooted in $A$ which generates the
string $w_i \dots w_{i+j-1}$. We shall now see how to compute these $p^{A}_{\max,i,j}$'s via dynamic programming.

Recall that we are assuming that $G$ is in SNF form. For $j = 1$, let
$$q^{A}_{\max,i,1} := \max_{r = (A \xrightarrow{p(r)} w_i)} p(r).$$
In other words, $q^{A}_{\max,i,1}$ is simply the maximum probability of any rule associated with $A$ that
immediately generates the terminal symbol $w_i$. By default, if there is no such rule then $q^{A}_{\max,i,1} := 0$.
For $j > 1$, let $q^{A}_{\max,i,j}$ denote the maximum probability of a parse tree rooted in $A$ which generates
the string $w_i \dots w_{i+j-1}$, and furthermore where the rule used at the root of the parse tree is of the
form $A \xrightarrow{1} BC$, where the yields of both children $B$ and $C$ are not $\epsilon$.

Figure 3 describes a dynamic programming algorithm for computing both the $p^{A}_{\max,i,j}$'s and the $q^{A}_{\max,i,j}$'s
for all $i$ and $j$, based on mutual recurrence relations. The algorithm assumes that for all
nonterminals $A$ and $B$ the values $p^{\max}_{A,\epsilon}$ and $p^{\max}_{A,B}$ have already been computed, as described in the
previous steps.

It is not difficult to show that the algorithm is correct. Note that $p^{A}_{\max,1,n} = p^{\max}_{A,w}$, and thus
the algorithm computes the maximum probability of a parse tree for a string $w$.

Note furthermore that the algorithm runs in polynomially many steps, only requiring unit-cost
multiplication operations and comparison operations. Furthermore, we can easily augment the
algorithm so that, in addition to just computing $p^{\max}_{A,w}$, it also computes a DAG (a straight-line
program) representation of a maximum probability parse tree $t^{\max}_{A,w}$ for $w$ rooted at $A$, while still
requiring only polynomially many operations in total. □

Initialization:
for all nonterminals $A \in V$ and $i \in \{1, \dots, n\}$ do
    $q^{A}_{\max,i,1} := \max_{r = (A \xrightarrow{p(r)} w_i)} p(r)$;
end for
for all nonterminals $A \in V$ and $i \in \{1, \dots, n\}$ do
    $p^{A}_{\max,i,1} := \max_{B \in V} \; p^{\max}_{A,B} \cdot q^{B}_{\max,i,1}$;
end for
Main Loop:
for $j := 2, \dots, n$ do
    for all nonterminals $A \in V$ and $i \in \{1, \dots, n\}$ do
        if ($i + j - 1 \le n$) then
            $q^{A}_{\max,i,j} := \max_{\{ m \in \{1, \dots, j-1\} \text{ and rule } r = (A \to BC) \}} \; p(r) \cdot p^{B}_{\max,i,m} \cdot p^{C}_{\max,i+m,j-m}$;
        end if
    end for
    for all nonterminals $A \in V$ and $i \in \{1, \dots, n\}$ do
        if ($i + j - 1 \le n$) then
            $p^{A}_{\max,i,j} := \max_{B \in V} \; p^{\max}_{A,B} \cdot q^{B}_{\max,i,j}$;
        end if
    end for
end for

Figure 3: Dynamic program for computing maximum probability parse of given string.
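The dynamic program of Figure 3 can be sketched in Python as below, for a grammar in SNF and with the path probabilities $p^{\max}_{A,B}$ supplied as input. The function name `max_parse_dp`, the dictionary `p_path`, and the example grammar are assumptions made here; the sketch treats $p^{\max}_{A,A}$ as 1 (the empty path), uses the same 1-based indices $i, j$ as the text, and works with exact fractions instead of PoE numbers.

```python
from fractions import Fraction

def max_parse_dp(nonterminals, rules, p_path, w):
    """Return p with p[A, i, j] = max probability of a parse tree rooted at A
    generating w_i ... w_{i+j-1}. rules: (left, right_tuple, prob) in SNF;
    p_path[(A, B)]: max probability that A derives the single nonterminal B."""
    n = len(w)
    def path(a, b):
        return Fraction(1) if a == b else p_path.get((a, b), Fraction(0))
    p, q = {}, {}
    for j in range(1, n + 1):
        for i in range(1, n - j + 2):               # guarantees i + j - 1 <= n
            for a in nonterminals:
                best = Fraction(0)
                for left, right, prob in rules:
                    if left != a:
                        continue
                    if j == 1 and right == (w[i - 1],):      # rule A -> w_i
                        best = max(best, prob)
                    elif j > 1 and len(right) == 2:          # rule A -> B C
                        b, c = right
                        for m in range(1, j):                # split point
                            best = max(best, prob * p[b, i, m] * p[c, i + m, j - m])
                q[a, i, j] = best
            for a in nonterminals:
                p[a, i, j] = max(path(a, b) * q[b, i, j] for b in nonterminals)
    return p

# Made-up SNF grammar: S -> X (3/4) | D (1/4), X -> D S (1), D -> a (1).
NTS = ["S", "X", "D"]
RULES = [
    ("S", ("X",), Fraction(3, 4)), ("S", ("D",), Fraction(1, 4)),
    ("X", ("D", "S"), Fraction(1)),
    ("D", ("a",), Fraction(1)),
]
P_PATH = {("S", "X"): Fraction(3, 4), ("S", "D"): Fraction(1, 4)}
table = max_parse_dp(NTS, RULES, P_PATH, "aa")
```

For the string `aa` the only parse from `S` uses `S -> X`, `X -> D S`, `S -> D`, giving probability 3/4 · 1/4 = 3/16, which is the value stored in `table["S", 1, 2]`.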

Corollary 4.3. Given any SCFG, $G$, with rational rule probabilities, and given a string, $w \in \Sigma^*$,
where $\Sigma$ is the terminal alphabet of $G$:

A. If either the Lang-Waldschmidt Conjecture, Conjecture 3.3, or Baker's version of the ABC
conjecture, Conjecture 3.4, holds,

B. or else, if the number of distinct probabilities labeling the rules of $G$ is bounded by a fixed
constant, $c$,

then the following all hold:

1. There is a P-time algorithm, in the standard Turing model of computation, for computing
the exact probability $p^{\max}_{G,w}$ in succinct product of exponentials notation (PoE), and there is a
P-time algorithm for computing, if $p^{\max}_{G,w} > 0$, a maximum probability parse tree $t^{\max}_{G,w}$, where
$t^{\max}_{G,w}$ is represented succinctly as a DAG (straight-line program).

2. Given, additionally, a rational probability $q \in (0, 1]$, encoded in binary,^4 there is a P-time
algorithm in the standard Turing model of computation that decides whether $p^{\max}_{G,w} > q$.

3. Likewise, given additionally another string $w' \in \Sigma^*$, there is a P-time algorithm (in the Turing
model), that decides whether $p^{\max}_{G,w} > p^{\max}_{G,w'}$.

^4 If we are assuming (A.), then $q$ can even be given in PoE. If we are instead assuming (B.), then $q$ can also be
given in PoE by $a^{b}$, but with $a = (a_1, \dots, a_k)$, where $k \le c'$, for some fixed constant $c'$.
Proof. The claims follow easily from the proof of Theorem 4.2. Specifically, recall that the
unit-cost RAM algorithms in the proof of Theorem 4.2 only involve multiplication of rational numbers
starting with base numbers that are rule probabilities of the given SCFG $G$, as well as taking the
maximum over such numbers (which we can carry out by comparisons). We can therefore maintain
all the computed numbers in PoE format, and based on the postulated conditions it follows from
Proposition 3.7 that we can carry out all the necessary comparisons in P-time, yielding a P-time
algorithm overall which computes the relevant probabilities in PoE notation, and which can compare
two such probabilities. □

Furthermore, without any conjectures, essentially the same algorithm can be used to approxi- 
mate the maximum parsing probability of a string: 

Corollary 4.4. Given any SCFG, $G$, with rational rule probabilities, given a string, $w \in \Sigma^*$, where
$\Sigma$ is the terminal alphabet of $G$, and given rational $\varepsilon > 0$, there is an algorithm (in the standard
Turing model) that runs in time polynomial in $|G|$, $|w|$ and $\log(1/\varepsilon)$, which determines whether
$w \in L(G)$ and if so, computes a value $v$ such that $|\log_2(p^{\max}_{G,w}) - v| < \varepsilon$ and the DAG representation
of a parse tree of $w$ that has probability $\ge (1 - \varepsilon)\, p^{\max}_{G,w}$.

Proof. Assume, wlog, that we first put the SCFG in SNF form. Let $n$ be the number of nonterminals
of the SNF grammar. Let us first estimate the size of the PoE expressions for the probabilities that
are computed by the algorithm of Theorem 4.2. It is easy to show that the maximum probability
$p^{\max}_{A,\epsilon}$ of a derivation of $\epsilon$ from a nonterminal $A$ is given by a PoE expression whose bases are rule
probabilities and where the sum of the exponents is less than $2^{n}$. This can be shown by an easy
induction on the iterations of Knuth's algorithm in Fig. 2, where the inductive claim is that the
sum of the exponents for a nonterminal that is added to $D$ in the $i$th iteration is at most $2^{i} - 1$.

Then we construct a directed graph $H$ and use Dijkstra's algorithm to compute probabilities
$p^{\max}_{A,B}$. The edges of $H$ have probabilities of the form $p(r) \cdot p^{\max}_{C,\epsilon}$, and the optimal path between any
two nodes is simple, thus it has length at most $n - 1$. Therefore, each probability $p^{\max}_{A,B}$ in PoE
notation has sum of exponents at most $(n - 1) \cdot 2^{n}$.

If we then consider the algorithm of Fig. 3, it is easy to show by induction on $j$ that a probability
$p^{A}_{\max,i,j}$ in PoE notation has sum of exponents at most $(2j - 1)((n - 1)2^{n} + 1)$. Thus, the PoE
expressions for the final probabilities, as well as all the probabilities during the computation, have
sum of exponents less than $2n^{2} 2^{n}$. In other words, the logarithms of the computed probabilities
are (positive) linear combinations of the logarithms of the rule probabilities where the sum of the
coefficients is less than $2n^{2} 2^{n}$.

Our algorithm for the approximate computation of the maximum parsing probability starts
by computing approximately the logarithms of the rule probabilities to a precision of $k = \lceil 2n +
\log_2(1/\varepsilon) \rceil$ bits, i.e., within additive error $2^{-k} \le \varepsilon / 2^{2n} \le \varepsilon / (2n^{2} 2^{n})$ (the second inequality uses
$2^{n} \ge 2n^{2}$, which holds for $n \ge 7$; for smaller $n$ a constant number of additional bits of precision suffices).
Then we apply the algorithm of
Theorem 4.2, doing all the computations (exactly) in logarithms, using the approximate values for
the logarithms of the rule probabilities. Note that every computed quantity is a linear combination
of the logarithms of rule probabilities. The cumulative additive error that is introduced from the
initial approximation to the logarithms of the rule probabilities is at most $2n^{2} 2^{n} \cdot 2^{-k} \le \varepsilon$.

The algorithm computes at the same time the DAG representation of a parse tree $T$, whose
probability satisfies $\log_2(p(T)) \ge \log_2 p^{\max}_{G,w} - \varepsilon$. Thus, $p(T) \ge p^{\max}_{G,w} / 2^{\varepsilon} \ge (1 - \varepsilon)\, p^{\max}_{G,w}$. □

Acknowledgements. Thanks to Howard Karloff and Igor Shparlinski for comments on an earlier 
draft. In particular, thanks to Igor for pointing out references [21 18] on efficient factor refinement. 
A Appendix

A.1 Proof of Proposition 1.1

Proposition 1.1. There is a simple P-time translation from a given number represented in PoE to
the same number represented as an arithmetic circuit over $\{*, /\}$ with integer inputs (represented
in binary). Likewise, there is a simple P-time translation in the other direction.

Proof. Given a number in PoE, observe first that, for integers $a_j$ and $b_j$, we can use Horner's rule to
represent $a_j^{b_j}$ by an arithmetic circuit which takes $a_j$ as an input, and performs repeated squaring
and multiplication by $a_j$ to obtain a circuit evaluating to $a_j^{b_j}$ whose depth and number of gates
is $O(\log_2(b_j))$. We can then obviously compute a linear size circuit yielding the product
$a_1^{b_1} a_2^{b_2} \cdots a_n^{b_n}$.

In the other direction, given an arithmetic circuit $C$ over $\{*, /\}$, with positive integer inputs
$a_1, \dots, a_n$, we show by induction on the depth of the circuit that the number computed by a gate at
depth $d$ can be translated to a PoE $a^{b}$, where the base numbers are the $a_j$'s and where the exponents
$b_i$ have absolute value at most $2^{d}$, and can thus be computed in linear time, given $C$ as input. The
claim is obvious in the base case, for input gates of the circuit $C$, which are the $a_j$'s. Inductively,
assume that for some gate $g_i = g_j \circ g_k$ we already have PoE representations for $g_j$ and $g_k$, given
by $a^{b(j)}$ and $a^{b(k)}$ respectively, where $\circ \in \{*, /\}$. Assume that $d = \max\{\mathrm{depth}(g_j), \mathrm{depth}(g_k)\}$. By
induction we can assume each component of the lists (vectors) $b(j)$ and $b(k)$ has absolute value at most $2^{d}$.

Now if $\circ = {*}$, then clearly the value computed by gate $g_i$ can be represented by $a^{b(i)}$, where
$b(i) = b(j) + b(k)$ denotes component-wise addition of the two vectors of integers $b(j)$ and $b(k)$.
Likewise, if $\circ = {/}$, then clearly $g_i$ can be represented by $a^{b(i)}$ where $b(i) = b(j) - b(k)$. In both
cases, for every component $m \in \{1, \dots, n\}$, we have $|b(i)_m| \le 2^{d+1}$, since we at most double the
absolute value by adding or subtracting two numbers with absolute value $\le 2^{d}$. Since $g_i$ has depth
$d + 1$, we are done. □
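The circuit-to-PoE direction of this proof amounts to propagating exponent vectors through the gates, as in the following sketch. The gate encoding (a list of `(op, j, k)` triples with indices into inputs and earlier gates) and the names `circuit_to_poe` and `poe_value` are assumptions made here for illustration; `poe_value` evaluates the result exactly and is only meant for checking small examples.

```python
from fractions import Fraction

def circuit_to_poe(num_inputs, gates):
    """gates: list of (op, j, k) with op in {"*", "/"}. Indices below num_inputs
    refer to the inputs a_1..a_n, larger indices to earlier gates.
    Returns the exponent vector b of the last gate, so its value is prod a_i^{b_i}."""
    # Input a_i has the unit vector e_i as its exponent vector.
    vectors = [[int(i == m) for m in range(num_inputs)] for i in range(num_inputs)]
    for op, j, k in gates:
        sign = 1 if op == "*" else -1        # multiply: add vectors; divide: subtract
        vectors.append([x + sign * y for x, y in zip(vectors[j], vectors[k])])
    return vectors[-1]

def poe_value(bases, exponents):
    """Evaluate a PoE expression exactly (only sensible for small exponents)."""
    value = Fraction(1)
    for a, b in zip(bases, exponents):
        value *= Fraction(a) ** b
    return value

# Inputs a_1 = 2 (index 0) and a_2 = 3 (index 1).
# g2 = a_1 * a_1, g3 = g2 * g2, g4 = g3 / a_2, i.e. the circuit computes 2^4 / 3.
GATES = [("*", 0, 0), ("*", 2, 2), ("/", 3, 1)]
exps = circuit_to_poe(2, GATES)
```

Each gate at most doubles the largest absolute exponent, matching the $2^{d}$ bound used in the induction.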

A.2 Proof of Proposition 3.5

Recall we use $\log(x)$ to denote the natural logarithm of $x$, and use $\log_2(x)$ to denote the log base 2.
Proposition 3.5. There is an algorithm that, given a positive integer $a$, encoded in binary, and
given a positive integer $j$, encoded in unary, computes a rational value $v_a$, such that

$$|\log(a) - v_a| < 2^{-j}.$$

The algorithm runs in time polynomial in $j$ and $\log_2(a)$ (in the Turing model).

Proof. Recall the standard power series for the natural logarithm of $x + 1$, which holds for any $x$
in the range $-1 < x \le 1$:

$$\log(x + 1) = x - x^{2}/2 + x^{3}/3 - x^{4}/4 + \dots = \sum_{i=1}^{\infty} (-1)^{i+1} \frac{x^{i}}{i} \tag{6}$$

Consider any positive integer $a$. We can assume $a > 1$, wlog, because otherwise $\log(1) = 0$. By
examining the binary encoding of $a$, we can easily determine a positive integer $m > 0$, such that
$2^{m} \le a < 2^{m+1}$. Note that

$$\log(a) = \log(a/2^{m+1}) + \log(2) \cdot (m + 1).$$

Thus, to compute $\log(a)$ "to within sufficient accuracy", it suffices to compute both $\log(a/2^{m+1})$
and $\log(2)$, "to within sufficient accuracy". But note that since $a \ge 2^{m}$, we have that $1/2 \le
a/2^{m+1} < 1$. Letting $y := (a/2^{m+1} - 1)$, since $-1/2 \le y < 0$, we have that $y + 1 = a/2^{m+1}$ is in the
convergent region of the series (6), so that:

$$\log(a/2^{m+1}) = \sum_{i=1}^{\infty} (-1)^{i+1} \frac{y^{i}}{i}$$

Now, note that since $-1/2 \le y < 0$, we have $\left| \sum_{i=k+1}^{\infty} (-1)^{i+1} \frac{y^{i}}{i} \right| \le \sum_{i=k+1}^{\infty} (1/2)^{i} \le 1/2^{k}$.
Thus, to compute an approximant $v'$ for $\log(a/2^{m+1})$ such that $|\log(a/2^{m+1}) - v'| \le 1/2^{k}$, all we
have to do is to compute the first $k + 1$ terms of the series $\sum_{i=1}^{\infty} (-1)^{i+1} \frac{y^{i}}{i}$. This we can easily do
in time polynomial in $k$ and in the encoding size of $a$, by just carrying out the arithmetic. (The
encoding size of the numbers calculated this way will not get more than polynomially large in the
encoding size of $a$ and $k$: roughly each term has encoding size at most $k \cdot \mathrm{size}(a)$, so the encoding
size of the sum is at most $\log_2(k) \cdot k \cdot \mathrm{size}(a)$.)

Next, we need to compute an approximation of $\log(2)$. To do this, we use a well-known
alternative series derivable from (6): we have $\log(2) = -\log(1/2) = -\log(1 + (-1/2)) = \sum_{i=1}^{\infty} \frac{1}{i 2^{i}}$.
Again, we have $\sum_{i=k+1}^{\infty} \frac{1}{i 2^{i}} \le 1/2^{k}$. Thus, we can compute an approximant $v''$ of $\log(2)$, such that
$|v'' - \log(2)| \le 2^{-j}$, by just computing the first $j + 1$ terms of the series $\sum_{i=1}^{\infty} \frac{1}{i 2^{i}}$.

We would like to combine a suitable approximation $v'$ of $\log(a/2^{m+1})$ and a suitable
approximation $v''$ of $\log(2)$ to get an approximation of $\log(a) = \log(a/2^{m+1}) + \log(2)(m + 1)$, to within a
desired additive error $2^{-j}$ in time polynomial in the encoding size of $a$ and in $j$.

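We only need to observe that if $|\log(a/2^{m+1}) - v'| \le 2^{-(j+1)}$ and $|\log(2) - v''| \le 2^{-(j+1)}/(m + 1)$,
and if we let $v_a := (v' + v''(m + 1))$, then

$$\begin{aligned}
|\log(a) - v_a| &= \left| \left( \log(a/2^{m+1}) + \log(2)(m + 1) \right) - \left( v' + v''(m + 1) \right) \right| \\
&= \left| \left( \log(a/2^{m+1}) - v' \right) + (m + 1)\left( \log(2) - v'' \right) \right| \\
&\le \left| \log(a/2^{m+1}) - v' \right| + (m + 1) \left| \log(2) - v'' \right| \\
&\le 2^{-(j+1)} + 2^{-(j+1)} = 2^{-j}.
\end{aligned}$$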

Thus, given a positive integer $a$ in binary, we can compute a $2^{-j}$-approximation, $v_a$, of $\log(a)$ in
time polynomial in the encoding size of $a$ and in $j$, in the Turing model of computation. □
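The approximation scheme of this proof can be sketched with exact rational arithmetic as follows. The function name `approx_log` and the particular choice of how many series terms to sum (the variables `k` and `t`) are assumptions made here so that the two error targets $2^{-(j+1)}$ and $2^{-(j+1)}/(m+1)$ from the proof are met; it is a sketch of the argument, not a tuned implementation.

```python
from fractions import Fraction
import math

def approx_log(a, j):
    """Return a rational v with |log(a) - v| <= 2**(-j), for an integer a >= 1."""
    if a == 1:
        return Fraction(0)
    m = a.bit_length() - 1                      # 2**m <= a < 2**(m+1)
    y = Fraction(a, 2 ** (m + 1)) - 1           # -1/2 <= y < 0
    # v': truncate the series for log(1 + y); the tail after k terms is <= 2**-k.
    k = j + 1
    v1 = sum(Fraction((-1) ** (i + 1)) * y ** i / i for i in range(1, k + 2))
    # v'': truncate the series for log(2); we need error <= 2**-(j+1) / (m+1),
    # and 2**-t with t = j + 1 + bit_length(m+1) is below that bound.
    t = j + 1 + (m + 1).bit_length()
    v2 = sum(Fraction(1, i * 2 ** i) for i in range(1, t + 2))
    return v1 + (m + 1) * v2                    # v_a = v' + v''(m + 1)

# Compare against the floating-point logarithm for a quick sanity check.
error = abs(float(approx_log(10, 20)) - math.log(10))
```

The number of terms grows linearly in $j$ and in the bit length of $a$, which is why the paper takes $j$ in unary.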